Personal Sense and Idiolect: Combining Authorship Attribution and Opinion Analysis

Authors

  • Polina Panicheva
  • John Cardiff
  • Paolo Rosso
Abstract

Subjectivity analysis and authorship attribution are both active areas of research, but work in the two areas has been carried out separately. Our conjecture is that combining information about subjectivity in texts with information about their authorship can improve performance on both tasks. This paper presents a personalized approach to opinion mining that introduces the notions of personal sense and idiolect, and applies it to the polarity classification task. The underlying assumption is that different authors express their private states in text in individual ways, so opinion mining results can be improved by analyzing the texts of each author separately. The hypothesis is tested on a corpus of movie reviews by ten authors, and the results confirm that the personalized approach increases the performance of opinion mining. Automatic authorship attribution is then used to model the personalized approach, classifying documents by their assumed authorship. Although automatic authorship classification imposes a number of limitations on the dataset available for further experiments, once these issues are overcome, the attribution-based model of the personalized approach confirms the improvement over a baseline that uses no authorship information.
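
The contrast between a pooled baseline and the personalized approach can be illustrated with a minimal sketch. The abstract does not name the classifier, features, or data format used in the paper, so everything below is an assumption made for illustration: a small add-one-smoothed multinomial Naive Bayes over whitespace tokens, a six-review toy corpus with two hypothetical authors `a` and `b`, and the helper names `train_nb`, `predict_nb`, and `classify_with_attribution`. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_nb(model, text):
    """Return the most probable label, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # tokens unseen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus (illustrative only): (author, review text, polarity).
# Author "a" uses "unbelievable" as praise, author "b" as criticism.
reviews = [
    ("a", "unbelievable plot great acting", "pos"),
    ("a", "unbelievable and moving", "pos"),
    ("a", "dull plot weak acting", "neg"),
    ("b", "unbelievable plot holes", "neg"),
    ("b", "unbelievable nonsense", "neg"),
    ("b", "solid plot fine acting", "pos"),
]

# Baseline: one polarity model for all authors, no authorship information.
pooled = train_nb([(text, label) for _, text, label in reviews])

# Personalized approach: one polarity model per author.
by_author = defaultdict(list)
for author, text, label in reviews:
    by_author[author].append((text, label))
personalized = {author: train_nb(docs) for author, docs in by_author.items()}

# Authorship attribution: the same classifier, with authors as labels.
attribution = train_nb([(text, author) for author, text, _ in reviews])


def classify_with_attribution(text):
    """Predict the author first, then apply that author's polarity model."""
    author = predict_nb(attribution, text)
    return author, predict_nb(personalized[author], text)


print(predict_nb(pooled, "unbelievable"))
print(predict_nb(personalized["a"], "unbelievable"))
print(predict_nb(personalized["b"], "unbelievable"))
print(classify_with_attribution("unbelievable and moving acting"))
```

On this toy data the per-author models assign opposite polarities to the same word, while the pooled model must settle on a single reading for both authors. The last call mirrors the second stage described in the abstract, where the author is predicted rather than known.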

Similar resources

Idiolect-based Identity Disclosure and Authorship Attribution in Web-based Social Spaces

In this paper, we inspect new possible methods of Web surveillance combining web mining with sociolinguistic and semiotic related knowledge of human discourse. We first give an overview of telecommunication surveillance methods and systems, with focus on the Internet, and we describe the legal issues involved in Web or Internet communications investigations. We put the emphasis on identity disc...


Identifying subjective statements in news titles using a personal sense annotation framework

Subjective language contains information about private states. The goal of subjective language identification is to identify that a private state is expressed, without considering its polarity or specific emotion. A component of word meaning, "Personal Sense", has clear potential in the field of subjective language identification, as it reflects a meaning of words in terms of unique personal ex...


Can Anonymous Posters on Medical Forums be Reidentified?

BACKGROUND Participants in medical forums often reveal personal health information about themselves in their online postings. To feel comfortable revealing sensitive personal health information, some participants may hide their identity by posting anonymously. They can do this by using fake identities, nicknames, or pseudonyms that cannot readily be traced back to them. However, individual writ...


Authorship Attribution Using Text Distortion

Authorship attribution is associated with important applications in forensics and humanities research. A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre. In this paper, we present a novel method that enhances authorship attribution effectiveness by introducing a text distortion step before extracting st...


Clustering by Authorship Within and Across Documents

The vast majority of previous studies in authorship attribution assume the existence of documents (or parts of documents) labeled by authorship to be used as training instances in either closed-set or open-set attribution. However, in several applications it is not easy or even possible to find such labeled data and it is necessary to build unsupervised attribution models that are able to estim...


Publication date: 2010